Development of a conversational telephone speech recognizer for Levantine Arabic
نویسندگان
چکیده
Many languages, including Arabic, are characterized by a wide variety of different dialects that often differ strongly from each other. When developing speech technology for dialect-rich languages, the portability and reusability of data, algorithms, and system components becomes extremely important. In this paper, we describe the development of a large-vocabulary speech recognition system for Levantine Arabic, which was a new dialectal recognition task for our existing system. We discuss the dialect-specific modeling choices (grapheme vs. phoneme based acoustic models, automatic vowelization techniques, and morphological language models) and investigate to what extent techniques previously tested on other languages are portable to the present task. We present stateof-the-art recognition results on the 2004 Levantine Arabic Rich Transcription evaluation.
منابع مشابه
Developing and Using a Pilot Dialectal Arabic Treebank
In this paper, we describe the methodological procedures and issues that emerged from the development of a pilot Levantine Arabic Treebank (LATB) at the Linguistic Data Consortium (LDC) and its use at the Johns Hopkins University (JHU) Center for Language and Speech Processing workshop on Parsing Arabic Dialects (PAD). This pilot, consisting of morphological and syntactic annotation of approxim...
متن کاملTelephone-based conversational speech recognition in the JUPITER domain
This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recogniz...
متن کاملTelephone-based Conversational Speech Recognition in the Jupiter Domain1
This paper describes our experiences with developing a telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the recogniz...
متن کاملDialectal Arabic Orthography-based Transcription
The present paper describes the experience gained at LDC in the collection and transcription of conversational dialectal Arabic. The paper will cover the following: (a) Arabic language background; (b) objectives. principles, and methodological choices of dialectal Arabic transcription, (c) design features of LDC‟s „Arabic MultiDialectal Transcription Tool‟ (AMADAT) and metalanguage transcriptio...
متن کاملReal-time telephone-based speech recognition in the Jupiter domain
This paper describes our experiences with developing a realtime telephone-based speech recognizer as part of a conversational system in the weather information domain. This system has been used to collect spontaneous speech data which has proven to be extremely valuable for research in a number of different areas. After describing the corpus we have collected, we describe the development of the...
متن کامل